Abstract. By combining recent advances in Natural Language Processing and 
Conversational Agents (CAs), we suggest a new form of human-computer 
interaction for individuals to receive formative feedback on their argumentation 
to help them foster their logical reasoning skills. Hence, we introduce 
ArgueBot, a conversational agent that provides adaptive feedback on students’ 
logical argumentation. To this end, we 1) leveraged a corpus of argumentative 
student-written peer-reviews in German, 2) trained, tuned, and benchmarked a 
model that identifies claims, premises and non-argumentative sections of a given 
text, and 3) built a conversational feedback tool. We evaluated ArgueBot in a 
proof-of-concept evaluation with students. The evaluation results regarding 
technology acceptance, the performance of our trained model, and the qualitative 
feedback indicate the potential of leveraging recent advances in Natural 
Language Processing for new human-computer interaction use cases for scalable 
educational feedback. 


Keywords: Argumentation Learning, Argumentation Mining, Pedagogical 
Conversational Agents 


1 Introduction 


In today’s world most information is readily available and the importance of the 
ability to reproduce information is decreasing. This results in a shift of job profiles 
towards interdisciplinary, ambiguous and creative tasks [1]. Thus, educational 
institutions are asked to evolve their curricula with regard to the composition of 
skills and knowledge conveyed [2]. In particular, teaching higher-order thinking skills 
to students, such as critical thinking, collaboration or problem-solving, has become 
more important [3]. This has already been recognized by the Organization for Economic 
Co-operation and Development (OECD), which included these skills as a major element of 
their Learning Framework 2030 [4]. One subclass represents the skill of arguing in a 
structured, reflective and well-formed way [5]. Argumentation is not only an essential 
part of our daily communication and thinking but also contributes significantly to the 
competencies of communication, collaboration and problem-solving [6]. Starting with 
studies from Aristotle, the ability to form convincing arguments is recognized as the 
foundation for persuading an audience of novel ideas and plays a major role in strategic 
decision-making and analyzing different standpoints, especially in regard to managing 
digitally enabled organizations. To develop skills such as argumentation, it is of great 
importance for the individual to receive continuous feedback throughout their learning 
journey, also called formative feedback [7, 8]. However, universities face the challenge 
of providing individual learning conditions, since every student would need a personal 
tutor to have an optimal learning environment to learn how to argue [9]. This 
is naturally hindered by traditional large-scale lectures and by the growing field 
of distance learning scenarios such as massive open online courses (MOOCs) [10]. 

A possible solution lies in leveraging Natural Language Processing (NLP) 
and Machine Learning (ML) to provide students with adaptive and ongoing feedback, 
e.g., on texts and instant messages by a Conversational Agent (CA) [11] and thus 
provide them access to formative argumentation feedback. CAs are software programs 
that communicate with users through natural language interaction interfaces [12, 13]. 
The successful application of CAs to meet individual needs of learners and to increase 
their learning outcomes has been demonstrated for learning various skills such as 
problem-solving skills [14], programming skills [15], mathematical skills [16] as well 
as for learning factual knowledge [17], and also offers potential for training 
argumentation skills. One way to provide adaptive argumentation support 
could be the use of argumentation mining, a proven approach to identify and 
classify argumentation in texts. The potential of argumentation mining has been 
investigated in different research domains, such as automated skill learning support for 
students [18-20], accessing argumentation flows in legal texts [21], better 
understanding of customer opinions in user-generated comments [22], or fact-checking 
and de-opinionizing of news [23]. Hence, we suggest that the advantages of 
argumentation mining could be leveraged to design a new form of human-computer 
interaction for individuals to receive scalable formative argumentation feedback [24]. In 
fact, Lippi and Torroni (2016) [25] and Chernodub et al. (2019) [26] designed static 
argumentation web interfaces that can be used to automatically identify and classify 
argumentation components from English input texts. However, these tools fall short of 
providing an educational embedding and a user-friendly form of human-computer 
interaction, e.g., through a conversational interface, and lack application for German 
content, since they were trained on English corpora. Therefore, we aim to close this gap 
by designing a CA that can be used by individual learners to receive formative 
argumentation feedback, e.g., while writing an argumentative text. Overall, we aim to 
contribute to research and practice by answering the following research question: 


RQ: How do students perceive a conversational agent which provides adaptive 
argumentation feedback on a given text based on Argumentation Mining? 


To tackle the research question, we develop a CA in Slack (Slack bot) called 
ArgueBot (short for Argumentation Bot), which provides students feedback on the 
logical argumentation of given texts on the basis of existing theory (cognitive 
dissonance based on [27]). We believe cognitive dissonance theory could explain why 
formative feedback on an individual’s argumentation will motivate the individual to 
learn how to argue. By adaptive formative feedback, we mean a CA which 
provides individual and real-time feedback to individuals on a given text. 

We follow the Cross Industry Standard Process for Data Mining (CRISP-DM) 
Model [28] to build a novel model that identifies argumentation flaws in texts. For 
training our argumentative feedback model, we leverage the German business model 
feedback corpus of [29]. We develop a novel modelling approach to identify 
argumentative structures and build a CA in Slack, which provides individuals with 
formative argumentation feedback based on given textual messages. To answer our 
research question, we evaluate ArgueBot in a first proof-of-concept evaluation, where 
we ask students to write an argumentative text and use our CA to receive feedback on 
the argumentation structure. The measured technology acceptance, the positive 
qualitative feedback and the strong performance of our model compared to 
state-of-the-art approaches suggest using new forms of conversational human-computer interaction 
based on NLP to provide students with argumentation support in different scenarios. 

The remainder of the paper is structured as follows: First, we provide the necessary 
conceptual background on argumentation learning, argumentation mining and CAs 
based on a systematic literature review [1]. Next, we present our CRISP-DM 
methodology in section three and explain the implementation and evaluation of our 
model and ArgueBot in section four. Finally, we present and evaluate our results, 
followed by a discussion about the limitations and contributions of our study. 


2 Conceptual and Theoretical Background 


2.1 Adaptive Argumentation Learning 


As Kuhn (1992) [6] states, the skill to argue is of great significance not only for 
professional purposes like communication, collaboration and for solving difficult 
problems but also for most of our daily life: “It is in argument that we are likely to find 
the most significant way in which higher order thinking and reasoning figure in the 
lives of most people. Thinking as argument is implicated in all of the beliefs people 
hold, the judgments they make, and the conclusions they come to; it arises every time a 
significant decision must be made. Hence, argumentative thinking lies at the heart of 
what we should be concerned about in examining how, and how well, people think” 
([6], pp. 156-157). However, teaching argumentation is limited. [30] identified three 
major causes for that: “teachers lack the pedagogical skills to foster argumentation in 
the classroom, so there exists a lack of opportunities to practice argumentation; 
external pressures to cover material leaving no time for skill development; and deficient 
prior knowledge on the part of learners”. Therefore, many authors have claimed that 
fostering argumentation skills should be assigned a more central role in our educational 
system [31, 32]. Adaptive support approaches for argumentation learning (e.g., 15, 30- 
32) describe a rather new field of argumentation learning supported by IT-based 
systems. The aim is to provide pedagogical feedback on a learner’s actions and solutions, 
hints and recommendations to encourage and guide future activities in the writing 
process, or automated evaluation to indicate whether an argument is syntactically and 
semantically correct. However, the combination of NLP, ML and pedagogically 
evaluated formative feedback in a student’s learning journey has rarely been 
investigated due to its high complexity. As Scheuer (2015) identifies, “rigorous empirical research with 
respect to adaptation strategies is almost absent; a broad and solid theoretical 
underpinning, or theory of adaptation for collaborative and argumentative learning is 
still lacking” [36]. Therefore, we aim to address this research gap and design an easy- 
to-use argumentation learning tool based on a conversational human-computer 
interaction design. To do so, we build on the application of recent developments in NLP 
and ML, in which argumentation mining has been a proven approach to identify and 
analyze argumentative structures of a given text in real time [19, 29, 37, 38]. 


2.2 Argumentation Mining 


The foundation of argumentation mining is argumentation theory. Argumentation 
theory is about analyzing the structure and the connection between arguments. One of 
the most prominent argumentation models is the Toulmin model [39]. Toulmin’s model 
asserts that a “good” argument involves a logical structure built on grounds, claim and 
warrant, where the grounds are the evidence used to prove a claim. Walton et al. 
(2008) [40] developed the so-called “argumentation schemes” that use Toulmin’s 
type of reasoning. It is commonly considered that “Claim”, “Premise”, and “Warrant” 
are the main components of every argument, and the rest are supporting sub-argument 
parts that may or may not exist in an argument. Argumentation mining itself aims to 
identify these components of an argumentation model with NLP and ML. It falls under 
the category of computational argumentation, which encompasses a variety of tasks. 
These tasks include identifying the argumentation style [41], in which arguments are 
classified as "factual" or “emotional” in order to understand the characteristics better, 
identifying the reasoning behind the stance of the author by creating a classifier using 
the stance classification [42], identifying arguments to be used as summarization 
pointers [43] or ranking arguments according to how convincing they are using a joint 
model with one deep learning module in it [44]. Following [45], the most related 
subtasks of argumentation mining can be summed up as: 


• Argument identification, which is concerned with identifying the 
argumentative parts in raw text and setting their boundaries against 
non-argumentative text. 


• Argument component classification, whose primary purpose is to classify 
the components of the argument structure. Classifying an argumentative text 
into claims or premises is one popular way of tackling this subtask. 


• Argumentative discourse analysis, in which the researcher tries 
to identify the discourse relations between the various components existing in 
the argument. A typical example of this subtask is the identification of whether 
a support or an attack relationship exists between the claim and the premise. 
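
To make the three subtasks more concrete, the following minimal Python sketch shows one possible way to represent their outputs. The class and field names (Component, Relation, non_argumentative_tokens) and the example sentence are illustrative assumptions and are not taken from any of the cited systems.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    """Output of component classification: a typed token span."""
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)
    kind: str   # "Claim" or "Premise"


@dataclass(frozen=True)
class Relation:
    """Output of discourse analysis: a typed link between two components."""
    source: Component
    target: Component
    kind: str   # "support" or "attack"


def non_argumentative_tokens(n_tokens: int, components: List[Component]) -> List[int]:
    """Argument identification seen from the other side:
    token indices that are not covered by any argument component."""
    covered = set()
    for component in components:
        covered.update(range(component.start, component.end))
    return [i for i in range(n_tokens) if i not in covered]


# Hypothetical example sentence, already tokenized.
tokens = ["Overall", ",", "the", "pricing", "is", "unclear",
          "because", "costs", "rise", "."]
claim = Component(start=2, end=6, kind="Claim")      # "the pricing is unclear"
premise = Component(start=6, end=9, kind="Premise")  # "because costs rise"
relation = Relation(source=premise, target=claim, kind="support")
outside = non_argumentative_tokens(len(tokens), [claim, premise])
```

In this sketch only the first two subtasks (the spans and their types) correspond to what our study addresses; the Relation class is included solely to illustrate the third subtask.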


In our study we are focusing on the challenges of argument identification and 
argument component classification since these are usually the first two steps of an 
argumentation mining architecture and, thus, the foundation of every argumentation 
mining application. The potential of argumentation mining has been investigated in 
different research domains. However, it has rarely been leveraged to build a 
conversational learning tool to provide individuals with formative and adaptive 
argumentation feedback on a commonly available communication platform such as 
Slack [20, 37]. 


2.3 Adaptive Argumentation Feedback Tools and Conversational Agents 


Although argumentation mining is a growing field of research and many studies have 
been conducted, only very few practical tools exist that provide individuals — non- 
programmers — with access to this technology. In a systematic literature review, we 
found only two tools available that provide individuals with access to argumentation 
mining. Lippi and Torroni (2016) [25] developed the first online argumentation mining 
tool that was made available for a broad audience. Their tool, MARGOT, is available 
as a web application and processes a text that is input in the corresponding editor field. 
After processing and analyzing the text, the results are displayed on the user interface. 
Claims are displayed in bold font, whereas premises are displayed in italic style [25]. 
The second and most recent tool that provides individuals with access to argumentation 
mining was published by [26]. Similar to MARGOT, their system, called TARGER, lets 
a user analyze the argumentative structure of an input text. The results are then 
presented below the input, with claims highlighted in red and premises 
marked in green. However, both approaches are only available in English, not designed 
from an educational perspective and thus not necessarily easy-to-use and easy-to-access 
for students in their learning journey, since a user would always have to open the 
website, select a certain model and then copy his or her text into the input field. 
Therefore, we suggest building a novel conversational interface for leveraging 
argumentation mining for argumentation feedback for students. 

CAs are software programs that are designed to communicate with users through 
natural language interaction interfaces [12, 13]. In today’s world, conversational agents, 
such as Amazon’s Alexa, Google’s Assistant or Apple’s Siri, are ubiquitous, with their 
popularity steadily growing over the past few years [46, 47]. They are implemented in 
various areas, such as customer service [11], collaboration [48], counselling [49] or 
education (e.g., [50, 51]). Hobert and Wolff (2019) [52] define CAs used in education 
as a special form of learning application that interacts with learners individually. We 
believe that a CA can offer new forms of providing argumentative feedback to students 
through a conversation interface. Therefore, we aim to tackle the challenge of adaptive 
argumentation support to individuals by building a novel model based on 
argumentation mining and prototype a conversational interface on an open platform 
such as Slack. 


2.4 Cognitive Dissonance Theory 


We built our research project on cognitive dissonance theory. We believe that this 
theory might support our underlying hypothesis that individual and personal feedback 
on an individual’s argumentation motivates the individual to improve her skill level. 
Cognitive dissonance refers to the unsatisfying feeling that occurs when there is a 
conflict between one’s existing knowledge and contradicting presented information 
[27]. This uncomfortable internal state results in a high motivation to solve this 
inconsistency. According to Festinger’s theory, an individual experiencing this 
dissonance has three possible ways to resolve it: change the behavior, change the belief 
or rationalize the behavior. Especially for students in a learning process, dissonance is 
a highly motivating factor to gain and acquire knowledge to actively resolve the 
dissonance [53]. It can be an initial trigger for a student’s learning process and thus the 
construction of new knowledge structures [54]. However, the right amount of cognitive 
dissonance is very important for the motivation to resolve it. According to Festinger, 
individuals might not be motivated enough to resolve it if the dissonance is too obvious, 
whereas a high level of dissonance might lead to frustration. Therefore, we believe that 
the right level of feedback on a student’s skill through a conversational interface on an 
open communication platform could lead to cognitive dissonance and thus to 
motivation to change the behavior, belief or knowledge to learn how to argue [55]. 


3 Research Methodology 


To answer our research question, we develop an artifact following the Cross Industry 
Standard Process for Data Mining (CRISP-DM) Model, which is illustrated in Figure 
1 [28]. The model describes a standardized approach for data mining problems from a 
practical point of view, starting with business understanding, followed by data 
understanding, data preparation, and data modelling. 

Our approach is divided into five iterative stages. In the first stage, we analyzed the 
current state of argumentation learning and argumentation mining achievements in 
literature based on a systematic literature review [1]. Second, we investigated different 
corpora and their results at the current state in argument classification across multiple 
domains. We built on the corpus of [29] since it fulfills our requirements of a large data 
set, a rigorous annotation study and has been successfully used to provide students 
adaptive argumentation feedback [19]. Third and fourth, we built a model using NLP 
and Deep Learning (DL) algorithms to classify a text piece as a claim, premise or non- 
argumentative following the model of [38, 39, 56]. We iteratively evaluated the 
performance of our model and revised it based on various performance metrics such as 
the F1-score. In a fifth step, the model is deployed as the back-end algorithm of our 
conversational agent ArgueBot using the Slack communication platform. We chose 
Slack since it is a widely used team communication platform with strong user growth 
[57]. Moreover, Slack offers an easy-to-use API for building CAs. We believe that a 
Slack bot might lower the barrier for students to use such a feedback tool in their daily 
learning journey. Finally, we evaluate ArgueBot in a proof-of-concept evaluation, 
following the evaluation patterns for artefacts of Sonnenberg and vom Brocke (2012) [58]. 


[Figure: CRISP-DM cycle with the phases Business Understanding, Data Understanding, 
Data Preparation, Modeling, Evaluation and Deployment] 


Figure 1. Cross Industry Standard Process for Data Mining (CRISP-DM) [28] 


Our approach is developed using the programming language Python 3.7 for Machine 
Learning (ML) applications, since it is widely known, easy to use and supports major 
libraries for NLP and ML tasks. For Deep Learning, TensorFlow and its integrated 
Keras [59] are used. Additionally, the framework FARM by deepset was used. 


4 Implementation and Evaluation of ArgueBot 


To answer our research question, we aim to build a CA that provides adaptive 
feedback on the argumentation structure of a given text input by identifying claims, 
premises and non-argumentative sections. In order to accomplish this, we propose to 
train a model with a transfer learning approach based on Deep Bidirectional Transformers 
for Language Understanding (BERT) as seen in [60]. The first phase of CRISP-DM, 
business understanding, is explained in the introduction and the theoretical background 
of this work. 


4.1 Data Understanding and Data Preparation 


For training our argumentative feedback model, we leverage the German business 
model feedback corpus of [29] since it a) provides a large database of 1000 
argumentation-annotated student texts in German, b) is annotated based on 
the argumentation theory of Toulmin [39] and c) provides a rigorous annotation study 
with a moderate agreement. We split the corpus’ texts into tokens and assigned the 
corresponding label. In our case, the label includes two parts. The first part represents 
the IOB-encoding following [61]. The IOB-encoding indicates whether a token is the 
beginning of an argument component (“B” = beginning), is covered by an argument 
component (“I” = inside) or not included in any argument component (“O” = outside). 
Additionally, we include the specific argument type in the label, which results in labels 
such as “B-Premise” (standing for a token being the beginning of a premise) or 
“I-Claim” (standing for a token being inside a claim). Since several authors reached 
satisfying results with bidirectional Long-Short-Term-Memory-Conditional-Random- 
Fields classifiers (BiLSTM-CRF), we started with such a modelling architecture [26, 
62]. To prepare the data accordingly, the tokens are replaced by their index in the 
corresponding word embeddings vocabulary (GloVe) [63]. However, since we did not 
achieve sufficient accuracy in the further refinement of our model, the architecture was 
switched to the Bidirectional Encoder Representations from Transformers (BERT) 
proposed by Devlin et al. (2018) [60] in a second modelling cycle (see section 4.2). 
Therefore, the tokens are further split into word pieces to fulfill the preparation 
requirements for BERT. This also requires providing additional labels for the word 
piece endings. Finally, the data are transformed to PyTorch tensors, which represent 
multi-dimensional matrices containing a single data type, in order to match the model’s 
input requirements for the used framework. The special preprocessing for BERT was 
conducted by utilizing the tokenizer and processor provided by the FARM framework 
from deepset. 
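
The labelling steps described above can be illustrated with a minimal, dependency-free Python sketch. It is not the FARM preprocessing itself: the toy word-piece splitter, the placeholder label "X" for word-piece continuations and the example sentence are assumptions made purely for illustration.

```python
def iob_labels(n_tokens, spans):
    """Build one IOB label per token.

    spans: list of (start, end, kind) with `end` exclusive,
    where kind is "Claim" or "Premise".
    """
    labels = ["O"] * n_tokens
    for start, end, kind in spans:
        labels[start] = f"B-{kind}"
        for i in range(start + 1, end):
            labels[i] = f"I-{kind}"
    return labels


def toy_split(token, max_len=6):
    """Hypothetical stand-in for a real word-piece tokenizer:
    cuts long tokens into a head and a '##'-prefixed tail."""
    if len(token) <= max_len:
        return [token]
    return [token[:max_len], "##" + token[max_len:]]


def align_to_word_pieces(tokens, labels, split=toy_split):
    """Expand token-level labels to word pieces. The first piece keeps the
    token's label; continuation pieces get the placeholder label 'X'."""
    pieces, piece_labels = [], []
    for token, label in zip(tokens, labels):
        parts = split(token)
        pieces.extend(parts)
        piece_labels.extend([label] + ["X"] * (len(parts) - 1))
    return pieces, piece_labels


# Hypothetical example: one claim followed by one premise.
tokens = ["Overall", ",", "the", "pricing", "is", "unclear",
          "because", "costs", "rise", "."]
labels = iob_labels(len(tokens), [(2, 6, "Claim"), (6, 9, "Premise")])
pieces, piece_labels = align_to_word_pieces(tokens, labels)
```

Here `labels` starts with two "O" entries, marks "the" as "B-Claim" and "because" as "B-Premise", and the four tokens longer than six characters each contribute one extra "X" label after splitting.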


4.2 Iterative Modelling: Identifying Claims and Premises 


[Figure: corpus of German persuasive texts, preprocessing, German BERT model, 
token-level outputs (Token 1, Token 2, Token 3)] 

Figure 2. Overview of the model architecture based on BERT [60] 


The goal of our model is to provide accurate predictions to identify and classify 
argument components that can be used for an automated argument feedback system. 
We split the data into 70% training, 20% validation and 10% test data. We iteratively 
developed our model in two cycles. In the first cycle, the current state-of-the-art model, 
a bidirectional LSTM-CRF classifier with GloVe embeddings input, was created 
following the approach for persuasive essays of [62]. However, since we only reached 
an unsatisfying F1-score of 57 percent, we decided to follow a more novel transfer 
learning approach in the second modelling iteration. In fact, we decided to change 
the architecture of our model to the Bidirectional Encoder Representations from 
Transformers (BERT) proposed by Devlin et al. (2018) [60]. BERT has been 
successfully used to model argumentations across different domains [38]. A German 
BERT model is available through the framework FARM. For the proposed architecture, 
the inputs and outputs are adapted to the sequence classification task of argument 
component identification and classification. The last hidden layer is a Recurrent Neural 
Network with 512 nodes that takes the BERT output and learns to feed into a sigmoid 
layer that classifies each token according to the predicted label. Figure 2 illustrates the 
basic architecture of our model following BERT. 
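
The 70/20/10 split mentioned above can be sketched as follows. Whether the split was made at document level and which random seed was used is not stated, so both are assumptions here; the document identifiers are hypothetical placeholders for the 1000 corpus texts.

```python
import random


def split_corpus(documents, train_pct=70, validation_pct=20, seed=42):
    """Shuffle the documents reproducibly and cut them into
    training, validation and test portions (test = remainder)."""
    docs = list(documents)
    random.Random(seed).shuffle(docs)
    n_train = len(docs) * train_pct // 100
    n_validation = len(docs) * validation_pct // 100
    train = docs[:n_train]
    validation = docs[n_train:n_train + n_validation]
    test = docs[n_train + n_validation:]
    return train, validation, test


# Placeholder identifiers standing in for the annotated student texts.
corpus_ids = [f"review_{i:04d}" for i in range(1000)]
train_ids, validation_ids, test_ids = split_corpus(corpus_ids)
```

Splitting by whole documents rather than by tokens keeps all tokens of one text in the same portion, which avoids leaking parts of a test document into training.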

The proposed model was finetuned in several iterations and the best performing set 
of hyperparameters included a learning rate of 5e-5, a warmup and embedding dropout 
probability of 0.1 and 0.15 respectively. As presented in Table 2, BERT clearly 
outperforms other state-of-the-art model architectures for argumentation mining for 
persuasive essays (e.g., [62]). Whereas the bidirectional LSTM-CRF classifier achieved 
a macro F1-score of 0.57, the German BERT model performed about 28% better, 
reaching a macro F1-score of 0.73 on the token classification task on the German 
student-written business model review corpus. 


Table 2. Overview of overall performance of the BiLSTM-CRF and BERT 


Model        Precision  Recall  F1-Score 
BiLSTM-CRF   0.60       0.55    0.57 
BERT         0.74       0.72    0.73 


After a set of iterative refinements, we achieved satisfying results for the 
classification of argumentative components of the texts using the BERT model of 
deepset. The results are stated in Table 3 for the different labels. 


Table 3. Overview of results of BERT for the classification of argumentative components 


Label      Precision  Recall  F1-Score 
B-Claim    0.66       0.74    0.69 
I-Claim    0.72       0.62    0.66 
B-Premise  0.66       0.75    0.70 
I-Premise  0.75       0.68    0.72 
O          0.91       0.82    0.86 
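
As a consistency check, the short Python sketch below recomputes the overall BERT scores from the per-label values of Table 3, assuming that "macro" denotes the unweighted mean over the five labels. It also recomputes the relative improvement over the BiLSTM-CRF from the F1-scores in Table 2. The variable and function names are our own.

```python
# Per-label (precision, recall, F1) values as reported in Table 3.
per_label = {
    "B-Claim":   (0.66, 0.74, 0.69),
    "I-Claim":   (0.72, 0.62, 0.66),
    "B-Premise": (0.66, 0.75, 0.70),
    "I-Premise": (0.75, 0.68, 0.72),
    "O":         (0.91, 0.82, 0.86),
}


def macro_average(rows):
    """Unweighted mean of each metric column, rounded to two decimals."""
    n = len(rows)
    return tuple(round(sum(row[i] for row in rows) / n, 2) for i in range(3))


macro_precision, macro_recall, macro_f1 = macro_average(list(per_label.values()))

# Relative improvement of BERT over the BiLSTM-CRF (F1-scores from Table 2).
bilstm_f1, bert_f1 = 0.57, 0.73
relative_gain_pct = round((bert_f1 - bilstm_f1) / bilstm_f1 * 100)
```

Under this assumption the macro averages come out as 0.74, 0.72 and 0.73, matching the BERT row of Table 2, and the relative gain in F1-score is about 28%.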


4.3 Deployment: Building ArgueBot - a CA for Adaptive Argumentation 
Feedback 


In order to contribute to our research question, the next step was to deploy 
our trained model in a conversational interface so that students can receive formative 
feedback on their argumentation structure, e.g., when writing an argumentative essay. 
Therefore, the trained model was exported and implemented in the back-end of our CA 
ArgueBot, which identifies and classifies argument components and provides feedback 
on the argumentative structure. We chose Slack as a platform for ArgueBot since it is a 
team communication platform widely used by many large organizations. We believe 
that a CA in Slack (Slack bot) might lower the barrier for using such a feedback system 
in daily learning activities. A screenshot of our final CA in Slack is illustrated in Figure 
3. 

The functionality and usability are kept rather simple so that students can access our 
CA without any pre-knowledge or onboarding. The user can enter a message and send 
it to ArgueBot, e.g., an argumentative essay or an important email (see Figure 3: 1). 
This message is then sent to our trained model. Claims, premises and 
non-argumentative tokens are classified and sent back to the frontend. Following [25], 
claims are then displayed in bold font, whereas premises are displayed in italic style 
(see Figure 3: 2). In addition, ArgueBot provides a short summarizing feedback based 
on the number of premises and claims in the message (see Figure 3: 3). For example, if 
the message contains fewer than two premises or contains more claims than premises, the 
user receives corresponding feedback indicating that the argumentation is not 
sufficient. We believe that this individual and personal feedback on an individual’s 
argumentation motivates the individual to improve her skill level. 
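
The feedback step described above can be sketched in a few lines of Python. This is an illustrative reconstruction under stated assumptions, not the deployed ArgueBot code: it assumes the model returns one IOB label per token, uses Slack's message markup (asterisks for bold, underscores for italic), and the function names and the example sentence are hypothetical.

```python
def extract_segments(tokens, labels):
    """Group IOB-labelled tokens into (kind, text) segments,
    where kind is "Claim", "Premise" or "O"."""
    segments = []
    for token, label in zip(tokens, labels):
        kind = "O" if label == "O" else label.split("-", 1)[1]
        starts_new = (label.startswith("B-") or not segments
                      or segments[-1][0] != kind)
        if starts_new:
            segments.append([kind, [token]])
        else:
            segments[-1][1].append(token)
    return [(kind, " ".join(words)) for kind, words in segments]


def render_for_slack(segments):
    """Claims in bold (*...*), premises in italics (_..._)."""
    marks = {"Claim": "*", "Premise": "_", "O": ""}
    return " ".join(f"{marks[kind]}{text}{marks[kind]}" for kind, text in segments)


def argumentation_sufficient(segments):
    """Rule from the text: insufficient if there are fewer than two
    premises or more claims than premises."""
    claims = sum(1 for kind, _ in segments if kind == "Claim")
    premises = sum(1 for kind, _ in segments if kind == "Premise")
    return not (premises < 2 or claims > premises)


# Hypothetical model output for a short message.
tokens = ["Overall", ",", "the", "pricing", "is", "unclear",
          "because", "costs", "rise", "."]
labels = ["O", "O", "B-Claim", "I-Claim", "I-Claim", "I-Claim",
          "B-Premise", "I-Premise", "I-Premise", "O"]
segments = extract_segments(tokens, labels)
message = render_for_slack(segments)
sufficient = argumentation_sufficient(segments)
```

For this example the message contains one claim and one premise, so the rule flags the argumentation as not sufficient; a second supporting premise would change the outcome.
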
Figure 3. Screenshots of our CA ArgueBot providing argumentative feedback to a student 


4.4 Proof-of-Concept Evaluation of ArgueBot 


In order to answer our research question and evaluate the feasibility of ArgueBot we 
performed a proof-of-concept evaluation following the evaluation patterns of technical 
artifacts of [58]. Hence, our aim was to investigate how students perceive a 
conversational agent which provides adaptive argumentation feedback on given texts. 
Therefore, we performed a qualitative and a quantitative evaluation. We designed a 
laboratory experiment, where we asked students to write an argumentative text 
message, e.g., a statement about a business model. The participants were told that the 
text message should contain at least three sentences (around 50 words). We asked the 
participants to use ArgueBot to receive feedback on the argumentation structure of their 
text and revise it accordingly, if they wished. After that, the participants were asked to 
complete a post-survey. We captured the perceived usefulness, intention to use and ease 
of use following the technology acceptance model of [64, 65]. Exemplary items for the 
three constructs are: “Imagine the feedback tool would be available in your daily 
working life, I am encouraged to use it.”, “I would find the feedback tool useful for 
receiving feedback on my argumentation.”, “Using the feedback tool would allow me 
to write more argumentative messages.” or “The feedback tool is easy to use.” All 
these items were measured with a 7-point Likert scale (1: strongly disagree to 7: 
strongly agree, with 4 being a neutral statement). In addition, we collected demographic 
information, such as age, gender and occupation. Finally, the participants were asked 
to provide qualitative feedback regarding the strengths, weaknesses and suggestions for 
useful additional features. 

In total, ten users participated in our experiment: eight males and two females, who 
were between the ages of 22 and 25. Four participants were software development 
trainees working in a software company, and six participants were business students at 
master’s level. The quantitative evaluation of the results showed that the CA is considered 
to be easy to use with an overall score of 6.33. The intention to use and the perceived 
usefulness were rated similarly positively, with total scores of 5.5 and 5.63 respectively. 
A positive technology acceptance is especially important for learning tools to ensure 
students perceive the usage of the tool as helpful, useful and easy to interact with. This 
will foster motivation and engagement to use the learning application. The perceived 
usefulness and intention to use provide promising results for using this tool as a feedback 
application in different learning settings. 

Moreover, we performed a qualitative evaluation by asking the participants to 
provide more detailed feedback on what they particularly liked or disliked about the 
tool. We clustered the answers to form more concise feedback. Based on that, it seems 
that the short adaptive feedback on the argumentation structure is highly appreciated. 
The participants mentioned several things about the conversational interaction, such as 
the format of the feedback (“The formatting of the feedback is very clear”), the 
reaction time of the CA (“The fast reaction of the system (feedback time) is pleasant”), 
the differentiation of claims and premises (“I like the division of my statement into 
Claim and Premise”) or the overall feedback system (“The tool really helps to build 
up a meaningful argumentation structure, it recognized my rather bad argumentation 
immediately”). However, three of ten participants mentioned that the tool did not find 
the right argument components. Further, one participant also criticized the speed and 
the representation of the feedback, stating that the analysis takes too long and that 
the premises (italic) in particular are not clearly highlighted. Besides, several suggestions have been 
made for further development of such a feedback system, such as clear suggestions to 
improve the argumentation, spelling checks, scoring or a more detailed description of 
claims and premises. 


5 Discussion and Conclusion 


In our paper, we aimed to investigate if a conversational agent based on a novel 
modelling approach might provide a new form of human-computer interaction for 
formative argumentation skill feedback. To answer our research question, we developed 
a new argumentation classification pipeline based on the current state of transfer 
learning to build a CA that provides students with formative feedback on their 
argumentation, e.g., when writing an argumentative essay. We built on an existing 
corpus of argumentation annotated student texts in German to develop a novel 
modelling approach to identify argumentative structures and build a CA in Slack, which 
can be used by students in their daily learning activities to receive adaptive and ongoing 
feedback independent of an instructor, time and location. To evaluate how students 
perceive a conversational agent that provides adaptive argumentation feedback on a 
given text, we performed a first proof-of-concept evaluation, in which we asked 
students to write an argumentative text and to use our CA to receive 
adaptive feedback. Our results indicate that students would intend to use a 
conversational argumentation feedback tool. Moreover, the measured perceived 
usefulness and perceived ease of use provide evidence that this new form of 
human-computer interaction might help to leverage recent advances in NLP to provide 
students with writing support through a conversational interface. In order to 
successfully use a learning tool in a real-world scenario, positive technology acceptance 
is very important to ensure students perceive the usage of the tool as helpful, useful and 
easy to interact with. This will foster motivation and engagement to use the learning 
application. 

We build our research on cognitive dissonance theory [27]. We argue that a learning 
tool for argumentation skills (and possibly also metacognition skills) like our CA 
increases the motivation of students to learn how to apply certain skills, for example, 
how to argue, and thus improves the learning outcome. For example, ArgueBot, 
which provides instant and individual feedback, should increase the individual’s 
motivation to resolve dissonance and therefore construct new knowledge. This is in 
line with other studies on adaptive argumentation support in the literature of HCI 
research (e.g., [19, 55]). 

Thereby, our study contributes to two different research areas in information 
systems. First, we contribute to new forms of human-computer interaction in adaptive 
education by suggesting a use case that employs conversational interaction, with potential 
benefits for educational institutions and organizations seeking to foster adaptive and 
ongoing skill feedback. We show how an exemplary case, based on a CA and a novel NLP 
model, can be leveraged to provide students with individual support and feedback 
on a certain skill. Second, we contribute to the field of digital 
learning innovations by embedding recent technology from NLP and ML to help 
students learn how to argue independent of a human, time and location. 

Nevertheless, our study has some limitations. First, our evaluation is a 
proof-of-concept evaluation of the technology acceptance of our CA. We did not 
evaluate the influence of argumentation feedback through a CA on the actual 
argumentation skills. However, we believe cognitive dissonance theory might explain 
how adaptive feedback leads to the motivation to learn. Moreover, our evaluation is 
based on a rather small sample size. More participants in an evaluation are needed to 
strengthen the findings. Therefore, we call for future research to further investigate the 
potential of new conversation-based forms of human-computer interaction for adaptive 
skill learning. Empirical studies are needed to investigate the effects of new interactive 
learning tools on students’ skill level. Moreover, we did not investigate and evaluate 
specific design cues of a CA for adaptive argumentation feedback, since we wanted to 
provide a proof-of-concept study rather than a design science research approach. A 
more user-centered design approach, however, would be necessary to further 
investigate design parameters and design principles for adaptive conversation-based 
learning tools. 

All in all, we contribute to a new, unified approach for adaptive argumentative 
support of students by showing an exemplary use case for argumentation skill learning. 
Researchers can build on this to investigate new HCI use cases for other skills in which 
adaptive feedback might be necessary for formative learning in large-scale or distance 
learning scenarios. With further advances of NLP and ML, we hope our work will 
attract researchers to design more intelligent tutoring systems for other learning 
scenarios or metacognition skills and thus contribute to the OECD Learning Framework 
2030 towards a metacognition-skill-based education. 